Search - clustering text data

[Other resource] gmeans

Description: gmeans: clustering with first variation and splitting. Gmeans is a text clustering algorithm that supports three similarity measures: cosine, Euclidean, and KL divergence. The text data are represented as a sparse matrix.
Platform: | Size: 71569 | Author: 修宇 | Hits:

[Books] Survey of Text Mining - Clustering, Classification and Retrieval

Description: A classic introduction to text mining. Recommended download!
Platform: | Size: 4256186 | Author: dahae2010@yahoo.cn | Hits:

[Other] DBSCAN_JAVA

Description: A Java implementation of the DBSCAN algorithm. It runs DBSCAN clustering on the data file text.txt located in D:\.
Platform: | Size: 16384 | Author: 赵发毅 | Hits:
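
For readers unfamiliar with DBSCAN, the following is a minimal Python sketch of the textbook algorithm, not the listed Java code. The function names, the in-memory point list, and the `eps` and `min_pts` values are assumptions made for illustration; the format of text.txt is not described in the listing, so no file parsing is shown.

```python
from math import dist

def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Hypothetical sample data: two dense groups and one isolated point.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
sample_labels = dbscan(sample_points, eps=1.5, min_pts=3)
```

The neighborhood search here is a linear scan, which keeps the sketch short; a real implementation would typically use a spatial index.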

[Database system] KMEANSII

Description: K-means clustering algorithm II from neural networks: 1. KMIn is the input data text file; its first parameter is the number of points to cluster, the second is the dimension of the points, and the third is the required number of clusters. 2. KM2OUT contains the results computed by K-means clustering algorithm II.
Platform: | Size: 277504 | Author: blue8202 | Hits:
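
The input convention described above (point count, dimension, cluster count) can be sketched in Python as follows. The exact layout of KMIn is not given in the listing, so the whitespace-separated layout, the function names `parse_kmin` and `kmeans`, and the sample data are all assumptions; the clustering step is plain Lloyd's k-means, not necessarily the "algorithm II" variant of the package.

```python
from math import dist

def parse_kmin(text):
    """Parse 'n d k' followed by n*d coordinates (assumed layout)."""
    tokens = text.split()
    n, d, k = int(tokens[0]), int(tokens[1]), int(tokens[2])
    values = [float(t) for t in tokens[3:3 + n * d]]
    if len(values) != n * d:
        raise ValueError("expected %d coordinates, found %d" % (n * d, len(values)))
    points = [tuple(values[i * d:(i + 1) * d]) for i in range(n)]
    return points, k

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break                        # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical input: 4 points, 2 dimensions, 2 clusters.
sample = """4 2 2
0 0
0 1
10 10
10 11"""
points, k = parse_kmin(sample)
assignment, centroids = kmeans(points, k)
```

Seeding with the first k points keeps the example deterministic; random or k-means++ seeding is the more common choice in practice.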

[Graph program] cluster_algorithm

Description: Includes a divisive (decomposition) clustering algorithm and a k-means clustering algorithm, together with the data text files they use. Development environment: Visual Studio .NET 2003.
Platform: | Size: 1020928 | Author: 杨洋 | Hits:

[Windows Develop] text-data-mining

Description: This program shows how to perform data mining on TXT or WORD documents, extracting useful information from the text.
Platform: | Size: 384000 | Author: sam | Hits:

[JSP/Java] javacluster

Description: A Java implementation of text clustering. It uses TF/IDF weights, computes text similarity from the cosine of the angle between document vectors, and clusters the data with k-means, drawing on the related mathematics and statistics.
Platform: | Size: 1024 | Author: 优优 | Hits:
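
As an illustration of the TF/IDF weighting and cosine similarity named in this entry, here is a minimal Python sketch. It is not taken from the listed Java package: the function names, the sample documents, and the particular TF-IDF variant (relative term frequency times log(N/df)) are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical corpus, tokenized by whitespace.
docs = [
    "text clustering with k-means".split(),
    "k-means clustering of text documents".split(),
    "image compression with wavelets".split(),
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])   # two documents about text clustering
sim_02 = cosine(vecs[0], vecs[2])   # share only the word "with"
```

A k-means step would then operate on these vectors, typically after normalizing them so that cosine similarity and Euclidean distance rank neighbors the same way.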

[JSP/Java] java-cluster

Description: Text clustering implemented in Java, including the data preprocessing done before clustering: word segmentation, dimensionality reduction, building the vector space model, and so on.
Platform: | Size: 17408 | Author: 优优 | Hits:

[GIS program] KAV

Description: KAV is a small program written in Visual C++ 6.0 that performs cluster analysis on text data from specific data results. The clustering method used is K-means.
Platform: | Size: 64512 | Author: 随风 | Hits:

[Special Effects] Kmeans

Description: An OpenCV-based k-means clustering implementation that takes text data as input and outputs the clustering result.
Platform: | Size: 2260992 | Author: Cindy | Hits:

[Software Engineering] LJClusterDemo

Description: Text clustering is an automatic clustering technique based on similarity algorithms. It automatically groups large numbers of unlabeled documents, puts documents with similar content into the same class, and automatically generates characteristic topic words for each class. It is suited to applications such as automatic generation of hot public-opinion topics, tracking of major news events, and visual analysis of intelligence. According to Lingjoin (灵玖, www.lingjoin.com), its approach is based on core feature discovery and overcomes the large space consumption and long processing time of traditional clustering methods; it is described as fast, accurate, and light on memory, and as particularly suitable for clustering very large corpora and short-text corpora. The stated main features of the Lingjoin document clustering component are: 1. Speed: it can process massive volumes of web text data, averaging at least 500,000 documents per hour. 2. Accurate clustering: the top-N cluster centers often reflect the hot topics of the time, which suits public-opinion hotspot computation; the vendor claims its metrics are far ahead of those of Autonomy, a company internationally known for clustering, perhaps because Lingjoin understands Chinese better. 3. Accurate ranking: classes are ranked by influence weight, and the documents within each class are ranked by importance. 4. Customizable: the number of classes and the class centers can be customized. 5. Open interface: as part of LJParser, the component offers a flexible development interface, can be integrated easily into users' business systems, and supports various operating systems and calling languages. Lingjoin document clustering can be applied to text mining, knowledge management, search result clustering, public-opinion monitoring, and other applications.
Platform: | Size: 1100800 | Author: lingjoin | Hits:

[JSP/Java] src_2

Description: Another k-means clustering implementation for clustering text data.
Platform: | Size: 6144 | Author: anu | Hits:

[Algorithm] DocumentSet_rar

Description: Files that are very useful as data sets for document clustering. I did a project based on these document sets for my PG degree.
Platform: | Size: 35840 | Author: leena | Hits:

[Windows Develop] 1

Description: Research and implementation of text clustering on the WEKA platform. Text clustering is an important research branch of text mining and an application of clustering methods to text processing. This paper discusses and summarizes in some depth the text clustering process based on the vector space model and, using a text corpus and a data mining tool, studies and implements that process. It first presents the idea and process of text clustering, reviews existing results in the field, and lists basic research on feature representation, feature extraction, and related topics. It also reviews existing text clustering algorithms and the commonly used metrics for evaluating clustering quality. Building on this prior work, the paper uses the 20 Newsgroup text corpus and the vector space representation model to implement text preprocessing and the k-means clustering algorithm on the open-source data mining platform WEKA and, based on the actual clustering results, proposes optimizations for text representation, feature selection, feature dimensionality reduction, and other aspects.
Platform: | Size: 1022976 | Author: yueyue | Hits:

[matlab] data-mining

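Description: Data mining: "Machine Learning and Data Mining: Methods and Applications" (《机器学习与数据挖掘：方法与应用》), translated by Zhu Ming et al., Publishing House of Electronics Industry. Data mining covers classification, estimation, prediction, affinity grouping or association rules, clustering, description and visualization, and text and Web mining, among others.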
Platform: | Size: 20729856 | Author: 李杰 | Hits:

[SQL Server] DATA

Description: Data sets for text clustering and classification, consisting of 500 records extracted from 20newsgroup and Reuters, stored in four tables.
Platform: | Size: 5006336 | Author: wanghu | Hits:

[AI-NN-PR] module-1

Description: Vector quantization is a classical quantization technique from signal processing that models probability density functions by the distribution of prototype vectors. It was originally used for data compression. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms. Digital libraries consist not only of text data but also of speech and image data. To compress speech data, techniques such as vector quantization (VQ) are used.
Platform: | Size: 6144 | Author: noopur | Hits:
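
The encoding step described above, where each vector is replaced by its nearest prototype, can be sketched in Python as follows. This is an illustrative sketch rather than code from the listed module: the function names, the hand-picked codebook, and the sample points are assumptions, and in practice the codebook would be trained, for example with k-means.

```python
from math import dist

def encode(points, codebook):
    """Map each point to the index of its nearest codebook vector."""
    return [min(range(len(codebook)), key=lambda i: dist(p, codebook[i]))
            for p in points]

def decode(indices, codebook):
    """Reconstruct an approximation of the points from their codes."""
    return [codebook[i] for i in indices]

def distortion(points, codebook):
    """Mean squared error between the points and their reconstructions."""
    recon = decode(encode(points, codebook), codebook)
    return sum(dist(p, q) ** 2 for p, q in zip(points, recon)) / len(points)

# Hypothetical codebook (prototype vectors) and data.
codebook = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (10.0, 12.0)]
codes = encode(points, codebook)     # one small integer per point
error = distortion(points, codebook)
```

Compression comes from storing only the integer codes plus the codebook; the distortion value measures what is lost in exchange.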

[Other] An-Improved-KNN-Text-Classification-Algorithm-Bas

Description: An Improved KNN Text Classification Algorithm Based on Clustering. With the rapid development of the internet, a large amount of text information now exists in computer-readable form and is increasing exponentially. Internet data and resources have become massive. To manage and use this large amount of document data effectively, text mining and content-based information retrieval have gradually become active research fields worldwide.
Platform: | Size: 180224 | Author: AMIMIMEK | Hits:

[Other resource] Votingkmeans

Description: Implementation of a voting-based k-means cluster ensemble (fusion) algorithm for text data.
Platform: | Size: 7168 | Author: dylan | Hits:
